net, tests, stuntime: Wait for VMI affinity reconciliation before migration#5347
net, tests, stuntime: Wait for VMI affinity reconciliation before migration#5347Anatw wants to merge 4 commits into
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughThis PR adds retry-based waiting for VMI affinity reconciliation in ChangesAffinity-aware migration stuntime checks
Estimated code review effort: 3 (Moderate) | ~20 minutes Possibly related PRs
Suggested reviewers: 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsLinked repositories: Your configuration references 1 linked repositories, but your current plan allows 0. Analyzed ``, skipped Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Report bugs in Issues Welcome! 🎉This pull request will be automatically processed with the following features: 🔄 Automatic Actions
📋 Available CommandsPR Status Management
Review & Approval
Testing & Validation
Container Operations
Cherry-pick Operations
Branch Management
Label Management
✅ Merge RequirementsThis PR will be automatically approved when the following conditions are met:
📊 Review ProcessApprovers and ReviewersApprovers:
Reviewers:
Available Labels
AI Features
Security Checks
💡 Tips
For more information, please refer to the project documentation or contact the maintainers. |
|
/wip |
|
/build-and-push-container |
|
New container for quay.io/openshift-cnv/openshift-virtualization-tests:pr-5347 published |
d503c68 to
31df3e8
Compare
| expected_affinity = ( | ||
| asdict(obj=template_affinity, dict_factory=self._filter_out_none_values) if template_affinity else None | ||
| ) | ||
| for sample in TimeoutSampler( |
There was a problem hiding this comment.
Consider using a helper with @retry
| has_stuntime_rules = any( | ||
| expr.get("key", "").startswith("stuntime.") | ||
| for rule in pod_affinity[stale_type].get("requiredDuringSchedulingIgnoredDuringExecution", []) | ||
| for expr in rule.get("labelSelector", {}).get("matchExpressions", []) | ||
| ) |
There was a problem hiding this comment.
The rules are a bit unreadable
Consider splitting it to vars
There was a problem hiding this comment.
Ack, let me know if this is better or I should split further.
|
/retest all Auto-triggered: Files in this PR were modified by merged PR #5285. Overlapping fileslibs/vm/vm.py |
56a1e16 to
f5ce311
Compare
|
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py`:
- Around line 83-86: The cleanup for the `active_ping` session in
`measure_stuntime()` is only happening after the Jira-gated
`assert_affinity_after_migration()` path, so a failure there can leave the ping
running. Move the ping stop/report logic out of the normal flow and into a
`finally` block or fixture teardown around the migration/assertion sequence in
the stuntime migration tests so it always runs, even when
`assert_affinity_after_migration()` fails. Apply the same pattern in the related
`test_migration_stuntime` scenarios that use `wait_for_vmi_affinity()`,
`migrate_vm_and_verify()`, and `measure_stuntime()`.
In `@tests/network/libs/stuntime.py`:
- Around line 158-160: The membership check in the pod affinity assertion can
crash when vm.vmi.virt_launcher_pod.instance.to_dict()["spec"].get("affinity")
returns None, so update the diagnostic logic in this helper to handle the
missing-affinity case before checking expected_type in pod_affinity. Use a
pytest assertion in stuntime.py that explicitly verifies affinity is present and
then checks for the expected and stale types, with a clear failure message that
explains which affinity state was missing or unexpected.
In `@tests/network/localnet/migration_stuntime/test_migration_stuntime.py`:
- Around line 83-86: The migration stuntime tests are asserting affinity before
the ping session is guaranteed to be stopped, so a failure can leave active_ping
running and leak state into later migrations. Update the affected test blocks
around migrate_vm_and_verify(), assert_affinity_after_migration(), and
measure_stuntime() so ping cleanup is unconditional, using a finally block or
fixture teardown to stop/report the ping even if the affinity assertion fails.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: c139bd34-f5f3-4fab-aba9-96a051950522
📒 Files selected for processing (4)
libs/vm/vm.pytests/network/l2_bridge/migration_stuntime/test_migration_stuntime.pytests/network/libs/stuntime.pytests/network/localnet/migration_stuntime/test_migration_stuntime.py
azhivovk
left a comment
There was a problem hiding this comment.
Thanks
I have a few questions
|
/verified Tested on bm03-tlv2 uv run pytest -s --skip-deprecated-api-test --pdbcls=IPython.terminal.debugger:TerminalPdb --tc-format=python --cluster-sanity-skip-storage-check -m "tier3" tests/network/localnet/migration_stuntime/test_migration_stuntime.py tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py |
Test execution plan request detailsCRITICAL: You MUST post an inline review comment on the first changed line of the first file. As an expert software testing engineer, analyze all modified files in this PR and create a targeted test execution plan. Analysis Requirements:
Output rules: Your deliverable: Test Execution Plan
Use these formats:
Real test commands (MANDATORY when changes affect session/runtime code): When the affected code runs at session/collection time (conftest fixtures, pytest plugins,
Example output for a session-startup fix: Guidelines:
Hardware-Related Checks (SR-IOV, GPU, DPDK): When PR modifies fixtures for hardware-specific resources:
CRITICAL WORKFLOW COMPLETION RULES: When responding to this test execution plan request, you MUST follow these rules EXACTLY:
ABSOLUTE PROHIBITIONS (violating these creates empty/meaningless reviews):
Remember: The pull request review is visible to users. Additional comments are redundant noise. CRITICAL — Verification rules for smoke and gating decisions: |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
set -euo pipefail
repo="RedHatQE/openshift-virtualization-tests"
pr="5347"
echo "== PR head SHA =="
head_sha=$(gh pr view "$pr" --repo "$repo" --json headRefOid --jq '.headRefOid')
echo "$head_sha"
echo
echo "== Changed files in order =="
mapfile -t pr_files < <(gh pr view "$pr" --repo "$repo" --json files --jq '.files[].path')
printf '%s\n' "${pr_files[@]}"
first_file="${pr_files[0]}"
echo
echo "== First changed file =="
echo "$first_file"
echo
echo "== Patch for first changed file (head) =="
gh api "repos/$repo/pulls/$pr/files?per_page=100" --jq '.[] | select(.filename=="'"$first_file"'") | .patch'
echo
echo "== Current diff for changed files =="
for f in "${pr_files[@]}"; do
echo "--- $f ---"
git diff --unified=3 "origin/$(git rev-parse --abbrev-ref HEAD)~1" -- "$f" 2>/dev/null || git diff --unified=3 HEAD~1 -- "$f" || true
done
echo
echo "== Outline for candidate library/shared files =="
for f in libs/vm/vm.py tests/network/libs/stuntime.py; do
if [ -f "$f" ]; then
echo "--- $f ---"
ast-grep outline "$f" --view expanded | sed -n '1,240p'
fi
done
echo
echo "== Modified function/method definitions in libs/utilities/shared files =="
for f in libs/vm/vm.py tests/network/libs/stuntime.py; do
if [ -f "$f" ]; then
echo "--- $f ---"
git diff --unified=0 HEAD~1 -- "$f" | rg '^[+-]\s*(def |class )' || true
fi
done
echo
echo "== Direct callers/imports in tests and conftest =="
rg -n -C2 '\bwait_for_vmi_affinity\s*\(|\bassert_affinity_after_migration\s*\(|\bset_template_affinity\s*\(' tests tests/**/conftest.py || true
echo
echo "== Same-file callers in changed library/shared files =="
for f in libs/vm/vm.py tests/network/libs/stuntime.py; do
if [ -f "$f" ]; then
echo "--- $f ---"
rg -n -C2 'wait_for_vmi_affinity|assert_affinity_after_migration|_assert_vmi_affinity|_filter_out_none_values' "$f" || true
fi
done
echo
echo "== Changed test files: markers and test defs =="
for f in tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py tests/network/localnet/migration_stuntime/test_migration_stuntime.py; do
if [ -f "$f" ]; then
echo "--- $f markers ---"
rg -n '`@pytest`\.mark\.(smoke|gating|tier[0-9]+|polarion|parametrize)' "$f" || true
echo "--- $f tests ---"
rg -n '^\s*def test_' "$f" || true
fi
done
echo
echo "== Smoke-marked files =="
rg -l '`@pytest`\.mark\.smoke' tests || true
echo
echo "== Gating-marked files =="
rg -l '`@pytest`\.mark\.gating' tests || true
echo
echo "== Parent conftest references to modified symbols =="
rg -n -C2 'wait_for_vmi_affinity|assert_affinity_after_migration|set_template_affinity' tests/**/conftest.py tests/conftest.py || true
echo
echo "== First changed line in first changed file =="
first_line=$(
gh api "repos/$repo/pulls/$pr/files?per_page=100" --jq '.[] | select(.filename=="'"$first_file"'") | .patch' \
| python - <<'PY'
import sys,re
patch=sys.stdin.read().splitlines()
cur=None
for line in patch:
m=re.match(r'^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@', line)
if m:
cur=int(m.group(1))
continue
if cur is None:
continue
if line.startswith('+') and not line.startswith('+++'):
print(cur)
break
if line.startswith('-') and not line.startswith('---'):
continue
cur += 1
PY
)
echo "$first_line"
body="$(cat <<'EOF'
**Test Execution Plan**
- **Run smoke tests: False** — no verified dependency path from any `@pytest.mark.smoke` test or parent `conftest.py` to `libs/vm/vm.py::BaseVirtualMachine.wait_for_vmi_affinity()` or `tests/network/libs/stuntime.py::assert_affinity_after_migration()`.
- **Run gating tests: False** — no verified dependency path from any `@pytest.mark.gating` test to the changed symbols.
- **Affected tests to run**
- `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_off_server_node` (tier3)
- `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_to_server_node` (tier3)
- `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_off_client_node` (tier3)
- `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_to_client_node` (tier3)
- `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_off_server_node` (tier3)
- `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_to_server_node` (tier3)
- `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_off_client_node` (tier3)
- `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_to_client_node` (tier3)
These are the concrete affected paths because the changed tests now call `BaseVirtualMachine.wait_for_vmi_affinity()` directly after `set_template_affinity()`, and they conditionally call `tests/network/libs/stuntime.py::assert_affinity_after_migration()` after `migrate_vm_and_verify()`.
**Real tests (cluster required)**
Error path (the fix):
`pytest -s -o log_cli=true -m tier3 --jira tests/network/localnet/migration_stuntime/test_migration_stuntime.py -k 'test_client_migrates_off_server_node or test_server_migrates_off_client_node'`
Expected: after `set_template_affinity()`, the VMI affinity converges before migration, the migrated VM lands with matching anti-affinity rules, and stuntime stays within `STUNTIME_THRESHOLD_SECONDS`.
Happy path (regression):
`pytest -s -o log_cli=true -m tier3 --jira tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py -k 'test_client_migrates_to_server_node or test_server_migrates_to_client_node'`
Expected: affinity-targeted migrations still succeed, the target pod reflects the expected affinity type when `CNV-90576` is open, and stuntime stays within `STUNTIME_THRESHOLD_SECONDS`.
EOF
)"
echo
echo "== Attempt inline review comment posting =="
set +e
inline_ok=0
for attempt in 1 2 3; do
echo "-- inline attempt $attempt --"
gh api \
--method POST \
"repos/$repo/pulls/$pr/comments" \
-f body="$body" \
-f commit_id="$head_sha" \
-f path="$first_file" \
-F line="$first_line" \
-f side="RIGHT"
rc=$?
echo "inline_rc=$rc"
if [ $rc -eq 0 ]; then
inline_ok=1
break
fi
sleep 2
done
if [ $inline_ok -ne 1 ]; then
echo
echo "== Inline failed; attempt fallback regular PR comment =="
gh api \
--method POST \
"repos/$repo/issues/$pr/comments" \
-f body="$body"
echo "fallback_rc=$?"
fi
set -eLength of output: 7215 🏁 Script executed: #!/bin/bash
set -euo pipefail
repo="RedHatQE/openshift-virtualization-tests"
pr="5347"
tmpdir="$(mktemp -d)"
trap 'rm -rf "$tmpdir"' EXIT
echo "== PR head SHA =="
head_sha="$(gh pr view "$pr" --repo "$repo" --json headRefOid --jq '.headRefOid')"
echo "$head_sha"
echo
echo "== Changed files in order =="
gh pr view "$pr" --repo "$repo" --json files --jq '.files[].path' > "$tmpdir/pr_files.txt"
cat "$tmpdir/pr_files.txt"
first_file="$(sed -n '1p' "$tmpdir/pr_files.txt")"
echo
echo "== First changed file =="
echo "$first_file"
echo
echo "== Patch for first changed file (head) =="
gh api "repos/$repo/pulls/$pr/files?per_page=100" --jq '.[] | select(.filename=="'"$first_file"'") | .patch' > "$tmpdir/first_patch.txt"
cat "$tmpdir/first_patch.txt"
echo
echo "== Current diff for changed files =="
while IFS= read -r f; do
echo "--- $f ---"
git diff --unified=3 HEAD~1 -- "$f" || true
done < "$tmpdir/pr_files.txt"
echo
echo "== Outline for candidate library/shared files =="
for f in libs/vm/vm.py tests/network/libs/stuntime.py; do
if [ -f "$f" ]; then
echo "--- $f ---"
ast-grep outline "$f" --view expanded | sed -n '1,240p'
fi
done
echo
echo "== Modified function/method definitions in libs/utilities/shared files =="
for f in libs/vm/vm.py tests/network/libs/stuntime.py; do
if [ -f "$f" ]; then
echo "--- $f ---"
git diff --unified=0 HEAD~1 -- "$f" | rg '^[+-]\s*(def |class )' || true
fi
done
echo
echo "== Direct callers/imports in tests and conftest =="
rg -n -C2 '\bwait_for_vmi_affinity\s*\(|\bassert_affinity_after_migration\s*\(|\bset_template_affinity\s*\(' tests tests/**/conftest.py || true
echo
echo "== Same-file callers in changed library/shared files =="
for f in libs/vm/vm.py tests/network/libs/stuntime.py; do
if [ -f "$f" ]; then
echo "--- $f ---"
rg -n -C2 'wait_for_vmi_affinity|assert_affinity_after_migration|_assert_vmi_affinity|_filter_out_none_values' "$f" || true
fi
done
echo
echo "== Changed test files: markers and test defs =="
for f in tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py tests/network/localnet/migration_stuntime/test_migration_stuntime.py; do
if [ -f "$f" ]; then
echo "--- $f markers ---"
rg -n '`@pytest`\.mark\.(smoke|gating|tier[0-9]+|polarion|parametrize)' "$f" || true
echo "--- $f tests ---"
rg -n '^\s*def test_' "$f" || true
fi
done
echo
echo "== Smoke-marked files =="
rg -l '`@pytest`\.mark\.smoke' tests || true
echo
echo "== Gating-marked files =="
rg -l '`@pytest`\.mark\.gating' tests || true
echo
echo "== Parent conftest references to modified symbols =="
rg -n -C2 'wait_for_vmi_affinity|assert_affinity_after_migration|set_template_affinity' tests/**/conftest.py tests/conftest.py || true
echo
echo "== First changed line in first changed file =="
python - "$tmpdir/first_patch.txt" <<'PY' > "$tmpdir/first_line.txt"
import re, sys
patch = open(sys.argv[1], encoding="utf-8").read().splitlines()
cur = None
for line in patch:
m = re.match(r'^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@', line)
if m:
cur = int(m.group(1))
continue
if cur is None:
continue
if line.startswith('+') and not line.startswith('+++'):
print(cur)
break
if line.startswith('-') and not line.startswith('---'):
continue
cur += 1
PY
first_line="$(cat "$tmpdir/first_line.txt")"
echo "$first_line"
cat > "$tmpdir/body.md" <<'EOF'
**Test Execution Plan**
- **Run smoke tests: False** — no verified dependency path from any `@pytest.mark.smoke` test or parent `conftest.py` to `libs/vm/vm.py::BaseVirtualMachine.wait_for_vmi_affinity()` or `tests/network/libs/stuntime.py::assert_affinity_after_migration()`.
- **Run gating tests: False** — no verified dependency path from any `@pytest.mark.gating` test to the changed symbols.
- **Affected tests to run**
- `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_off_server_node` (tier3)
- `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_to_server_node` (tier3)
- `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_off_client_node` (tier3)
- `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_to_client_node` (tier3)
- `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_off_server_node` (tier3)
- `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_to_server_node` (tier3)
- `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_off_client_node` (tier3)
- `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_to_client_node` (tier3)
These are the concrete affected paths because the changed tests now call `BaseVirtualMachine.wait_for_vmi_affinity()` directly after `set_template_affinity()`, and they conditionally call `tests/network/libs/stuntime.py::assert_affinity_after_migration()` after `migrate_vm_and_verify()`.
**Real tests (cluster required)**
Error path (the fix):
`pytest -s -o log_cli=true -m tier3 --jira tests/network/localnet/migration_stuntime/test_migration_stuntime.py -k 'test_client_migrates_off_server_node or test_server_migrates_off_client_node'`
Expected: after `set_template_affinity()`, the VMI affinity converges before migration, the migrated VM lands with matching anti-affinity rules, and stuntime stays within `STUNTIME_THRESHOLD_SECONDS`.
Happy path (regression):
`pytest -s -o log_cli=true -m tier3 --jira tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py -k 'test_client_migrates_to_server_node or test_server_migrates_to_client_node'`
Expected: affinity-targeted migrations still succeed, the target pod reflects the expected affinity type when `CNV-90576` is open, and stuntime stays within `STUNTIME_THRESHOLD_SECONDS`.
EOF
body="$(cat "$tmpdir/body.md")"
echo
echo "== Attempt inline review comment posting =="
set +e
inline_ok=0
for attempt in 1 2 3; do
echo "-- inline attempt $attempt --"
gh api \
--method POST \
"repos/$repo/pulls/$pr/comments" \
-f body="$body" \
-f commit_id="$head_sha" \
-f path="$first_file" \
-F line="$first_line" \
-f side="RIGHT"
rc=$?
echo "inline_rc=$rc"
if [ "$rc" -eq 0 ]; then
inline_ok=1
break
fi
sleep 2
done
if [ "$inline_ok" -ne 1 ]; then
echo
echo "== Inline failed; attempt fallback regular PR comment =="
gh api \
--method POST \
"repos/$repo/issues/$pr/comments" \
-f body="$body"
echo "fallback_rc=$?"
fi
set -eLength of output: 46117
CRITICAL: I retried the required workflow on the latest PR head ( Why this matters: your workflow requires a non-blocking inline PR comment on Posting attempts
Here is the exact comment body to post manually on **Test Execution Plan**
- **Run smoke tests: False** — no verified dependency path from any `@pytest.mark.smoke` test or parent `conftest.py` to `libs/vm/vm.py::BaseVirtualMachine.wait_for_vmi_affinity()` or `tests/network/libs/stuntime.py::assert_affinity_after_migration()`.
- **Run gating tests: False** — no verified dependency path from any `@pytest.mark.gating` test to the changed symbols.
- **Affected tests to run**
- `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_off_server_node` (tier3)
- `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_to_server_node` (tier3)
- `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_off_client_node` (tier3)
- `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_to_client_node` (tier3)
- `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_off_server_node` (tier3)
- `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_to_server_node` (tier3)
- `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_off_client_node` (tier3)
- `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_to_client_node` (tier3)
These are the concrete affected paths because the changed tests now call `BaseVirtualMachine.wait_for_vmi_affinity()` directly after `set_template_affinity()`, and they conditionally call `tests/network/libs/stuntime.py::assert_affinity_after_migration()` after `migrate_vm_and_verify()`.
**Real tests (cluster required)**
Error path (the fix):
`pytest -s -o log_cli=true -m tier3 --jira tests/network/localnet/migration_stuntime/test_migration_stuntime.py -k 'test_client_migrates_off_server_node or test_server_migrates_off_client_node'`
Expected: after `set_template_affinity()`, the VMI affinity converges before migration, the migrated VM lands with matching anti-affinity rules, and stuntime stays within `STUNTIME_THRESHOLD_SECONDS`.
Happy path (regression):
`pytest -s -o log_cli=true -m tier3 --jira tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py -k 'test_client_migrates_to_server_node or test_server_migrates_to_client_node'`
Expected: affinity-targeted migrations still succeed, the target pod reflects the expected affinity type when `CNV-90576` is open, and stuntime stays within `STUNTIME_THRESHOLD_SECONDS`. |
azhivovk
left a comment
There was a problem hiding this comment.
Hopefully it's okay to apply W/A for the stuntime measruements
| self._assert_vmi_affinity(expected_affinity=expected_affinity) | ||
|
|
||
| @retry(wait_timeout=10, sleep=1) | ||
| def _assert_vmi_affinity(self, expected_affinity: dict[str, Any] | None) -> bool: |
There was a problem hiding this comment.
NIT:
Since we use assert for tests, consider renaming to expected_vmi_affinity
There was a problem hiding this comment.
+1 from me, let not mix up concerns. assert is about test itself, if you have to check some state in precondition/step, please use any other way/naming, not assert.
There was a problem hiding this comment.
Ack, renamed to _expected_vmi_affinity
| @retry(wait_timeout=10, sleep=1) | ||
| def _assert_vmi_affinity(self, expected_affinity: dict[str, Any] | None) -> bool: | ||
| vmi_affinity = self.vmi.instance.to_dict()["spec"].get("affinity") | ||
| assert vmi_affinity == expected_affinity, ( |
There was a problem hiding this comment.
see thread. assert is a test concern, not library. Raise a clear exception instead.
| return stuntime | ||
|
|
||
|
|
||
| def assert_affinity_after_migration(vm: BaseVirtualMachine, expected_type: str) -> None: |
There was a problem hiding this comment.
if this is not a test itself, rename the helper and don't use assert. I bet we don't test affinity after migration, hence it does not deserve assertion. Otherwise, it is a mess.
There was a problem hiding this comment.
This function is now redundant - removed entirely.
| localnet_stuntime_client_vm.set_template_affinity(affinity=new_pod_anti_affinity(label=SERVER_VM_LABEL)) | ||
| localnet_stuntime_client_vm.wait_for_vmi_affinity() | ||
| migrate_vm_and_verify(vm=localnet_stuntime_client_vm, client=admin_client) | ||
| if is_jira_open(jira_id="CNV-90576"): |
There was a problem hiding this comment.
btw don't we need to check affinity even without this bug? I mean, perhaps the direction is right regardless the bug -> checking jira is not needed?
There was a problem hiding this comment.
Firstly - the bug is an automation bug, so the product bug this code is dependent on is not relevant anymore.
To your point - I was conflicting about this too, but I think that we shouldn't test this regardless - we expect the product to behave as it should - for example, if we set an NNCP and it is reported as being configured successfully - we assume it is not lying - we don't check the node to make sure the bridge is actually there.
But I'm not 100% convinced, I'm curious on @EdDev take on this.
There was a problem hiding this comment.
I do not understand what is asked here.
I can just say that updating the affinity in order to control to where the VM will migrate to is a test thing. I do not expect users to use it, but they may... in any case we do not test it in this context.
There was a problem hiding this comment.
Thanks @EdDev , the question is whether we want to verify the affinity post migration, or we assume it is what we asked for. The bug exposed that migration can end successfully with stale affinity - making our stuntime measurements incorrect.
| patches = {self: {"spec": {"template": {"spec": {"affinity": template_affinity}}}}} | ||
| ResourceEditor(patches=patches).update() | ||
|
|
||
| def wait_for_vmi_affinity(self) -> None: |
There was a problem hiding this comment.
per my understanding, the API contract of "set affinity" should mean "affinity is applied when I return." Hence the wait operation should be a part if set_template_affinity. Otherwise, every caller must remember the wait, and future callers may forget. Consider integrating the reconciliation wait into set_template_affinity itself.
virt-controller and VMIM controller reconcile independently — if migration is triggered before the VM controller propagates the template affinity to the VMI, the target pod is created with stale scheduling rules. Assisted-by: Claude <noreply@anthropic.com> Signed-off-by: Anat Wax <awax@redhat.com>
Signed-off-by: Anat Wax <awax@redhat.com> Assisted-by: Claude <noreply@anthropic.com>
for more information, see https://pre-commit.ci
|
|
/verified Tested on bm03-tlv2 uv run pytest -s --skip-deprecated-api-test --pdbcls=IPython.terminal.debugger:TerminalPdb --tc-format=python --cluster-sanity-skip-storage-check -m "tier3" tests/network/localnet/migration_stuntime/test_migration_stuntime.py tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py |
Test execution plan request detailsCRITICAL: You MUST post an inline review comment on the first changed line of the first file. As an expert software testing engineer, analyze all modified files in this PR and create a targeted test execution plan. Analysis Requirements:
Output rules: Your deliverable: Test Execution Plan
Use these formats:
Real test commands (MANDATORY when changes affect session/runtime code): When the affected code runs at session/collection time (conftest fixtures, pytest plugins,
Example output for a session-startup fix: Guidelines:
Hardware-Related Checks (SR-IOV, GPU, DPDK): When PR modifies fixtures for hardware-specific resources:
CRITICAL WORKFLOW COMPLETION RULES: When responding to this test execution plan request, you MUST follow these rules EXACTLY:
ABSOLUTE PROHIBITIONS (violating these creates empty/meaningless reviews):
Remember: The pull request review is visible to users. Additional comments are redundant noise. CRITICAL — Verification rules for smoke and gating decisions: |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
set -euo pipefail
repo="RedHatQE/openshift-virtualization-tests"
pr="5347"
tmpdir="$(mktemp -d)"
trap 'rm -rf "$tmpdir"' EXIT
echo "== PR head SHA =="
head_sha="$(gh pr view "$pr" --repo "$repo" --json headRefOid --jq '.headRefOid')"
echo "$head_sha"
echo
echo "== Changed files in order =="
gh pr view "$pr" --repo "$repo" --json files --jq '.files[].path' > "$tmpdir/pr_files.txt"
cat "$tmpdir/pr_files.txt"
first_file="$(sed -n '1p' "$tmpdir/pr_files.txt")"
echo
echo "== First changed file: $first_file =="
echo
echo "== Patch for first changed file =="
gh api "repos/$repo/pulls/$pr/files?per_page=100" --jq '.[] | select(.filename=="'"$first_file"'") | .patch' > "$tmpdir/first_patch.txt"
cat "$tmpdir/first_patch.txt"
echo
echo "== Current libs/vm/vm.py content around affinity code =="
rg -n -C5 'def set_template_affinity|def wait_for_vmi_affinity|def _assert_vmi_affinity|_expected_vmi_affinity' libs/vm/vm.py || true
echo
echo "== Full diff for all changed files vs previous merge base =="
while IFS= read -r f; do
echo "--- $f ---"
git diff --unified=3 HEAD~1 -- "$f" || true
done < "$tmpdir/pr_files.txt"
echo
echo "== First changed line number =="
python - "$tmpdir/first_patch.txt" <<'PY'
import re, sys
patch = open(sys.argv[1], encoding="utf-8").read().splitlines()
cur = None
for line in patch:
m = re.match(r'^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@', line)
if m:
cur = int(m.group(1))
continue
if cur is None:
continue
if line.startswith('+') and not line.startswith('+++'):
print(cur)
break
if line.startswith('-') and not line.startswith('---'):
continue
cur += 1
PY
echo
echo "== Attempt inline comment on updated head =="
first_line=$(python - "$tmpdir/first_patch.txt" <<'PY'
import re, sys
patch = open(sys.argv[1], encoding="utf-8").read().splitlines()
cur = None
for line in patch:
m = re.match(r'^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@', line)
if m:
cur = int(m.group(1))
continue
if cur is None:
continue
if line.startswith('+') and not line.startswith('+++'):
print(cur)
break
if line.startswith('-') and not line.startswith('---'):
continue
cur += 1
PY
)
echo "first_line=$first_line"
cat > "$tmpdir/body.md" <<EOF
Test execution plan placeholder for verification of API access.
EOF
body="$(cat "$tmpdir/body.md")"
set +e
gh api \
--method POST \
"repos/$repo/pulls/$pr/comments" \
-f body="$body" \
-f commit_id="$head_sha" \
-f path="$first_file" \
-F line="$first_line" \
-f side="RIGHT"
echo "inline_rc=$?"
set -eLength of output: 7925 🏁 Script executed: #!/bin/bash
set -euo pipefail
repo="RedHatQE/openshift-virtualization-tests"
pr="5347"
echo "== Full diff of test files across entire PR =="
for f in tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py tests/network/localnet/migration_stuntime/test_migration_stuntime.py; do
echo "--- $f ---"
gh api "repos/$repo/pulls/$pr/files?per_page=100" --jq '.[] | select(.filename=="'"$f"'") | .patch'
done
echo
echo "== Full diff of utilities/virt.py =="
gh api "repos/$repo/pulls/$pr/files?per_page=100" --jq '.[] | select(.filename=="utilities/virt.py") | .patch'
echo
echo "== Callers of set_template_affinity, wait_for_vmi_affinity, assert_affinity_after_migration =="
rg -n -C2 'set_template_affinity|wait_for_vmi_affinity|assert_affinity_after_migration' tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py tests/network/localnet/migration_stuntime/test_migration_stuntime.py
echo
echo "== smoke/gating markers globally =="
rg -l '`@pytest`\.mark\.smoke' tests || true
rg -l '`@pytest`\.mark\.gating' tests || trueLength of output: 15662 🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "== Any smoke/gating test files call set_template_affinity? =="
rg -l 'set_template_affinity' tests || true
echo
echo "== Confirm set_template_affinity only used in stuntime tests =="
rg -n 'set_template_affinity' testsLength of output: 2926
CRITICAL: Re-ran the full workflow against the latest PR head ( The fallback issue-comment path is equally blocked by the same integration permission scope. This is a permissions issue on the GitHub App/integration token, not an analysis failure — write access to PR review comments/issue comments is required and currently absent. Updated analysis reflects the latest changes: Here is the exact comment body to post manually on **Test Execution Plan**
- **Run smoke tests: False** — no verified dependency path from any `@pytest.mark.smoke` test to `libs/vm/vm.py::BaseVirtualMachine.set_template_affinity()` / `_expected_vmi_affinity()`.
- **Run gating tests: False** — no verified dependency path from any `@pytest.mark.gating` test to the changed symbols.
- **Affected tests to run**
- `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py` (tier3, all cases use `set_template_affinity()`)
- `tests/network/localnet/migration_stuntime/test_migration_stuntime.py` (tier3, all cases use `set_template_affinity()`)
- `tests/network/l2_bridge/rhel9_rhel10_cluster/test_connectivity.py` (also calls `set_template_affinity()`, now waits for VMI reconciliation and can raise `ValueError` on timeout)
- `tests/network/localnet/rhel9_rhel10_cluster/test_connectivity.py` (also calls `set_template_affinity()`, same reasoning)
These are the concrete affected paths because `set_template_affinity()` now blocks on VMI affinity reconciliation (`_expected_vmi_affinity`, retry wait_timeout=10s) before returning, and can raise a `ValueError` if the VMI never reconciles — every caller of this method is exposed to this new behavior.
**Real tests (cluster required)**
Error path (the fix — race window closed):
`pytest -s -o log_cli=true -m tier3 --jira tests/network/localnet/migration_stuntime/test_migration_stuntime.py -k 'test_client_migrates_off_server_node or test_server_migrates_off_client_node'`
Expected: `set_template_affinity()` blocks until VMI affinity matches the template before migration starts; no stale-affinity migration target pod.
Happy path (regression, non-stuntime caller):
`pytest -s -o log_cli=true tests/network/l2_bridge/rhel9_rhel10_cluster/test_connectivity.py`
Expected: `set_template_affinity()` calls still complete normally within the 10s retry window; no `ValueError` timeouts introduced. |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #5347 +/- ##
==========================================
- Coverage 98.67% 98.66% -0.02%
==========================================
Files 25 42 +17
Lines 2487 2467 -20
==========================================
- Hits 2454 2434 -20
Misses 33 33
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
EdDev
left a comment
There was a problem hiding this comment.
virt-controller and VMIM controller reconcile independently
I think you meant VM controller and VMIM controller ...
| expected_affinity = ( | ||
| asdict(obj=affinity, dict_factory=self._filter_out_none_values) if affinity else None | ||
| ) | ||
| self._expected_vmi_affinity(expected_affinity=expected_affinity) |
There was a problem hiding this comment.
Does it work by just using template_affinity (instead of expected_affinity)?
| localnet_stuntime_client_vm.set_template_affinity(affinity=new_pod_anti_affinity(label=SERVER_VM_LABEL)) | ||
| localnet_stuntime_client_vm.wait_for_vmi_affinity() | ||
| migrate_vm_and_verify(vm=localnet_stuntime_client_vm, client=admin_client) | ||
| if is_jira_open(jira_id="CNV-90576"): |
There was a problem hiding this comment.
I do not understand what is asked here.
I can just say that updating the affinity in order to control to where the VM will migrate to is a test thing. I do not expect users to use it, but they may... in any case we do not test it in this context.
What this PR does / why we need it:
When a VM template affinity is updated and migration is triggered immediately after, virt-controller and the VMIM controller race: the VMIM controller may read the VMI before the VM controller has reconciled the template change, creating the migration target pod with stale affinity rules. The VM ends up on the wrong node.
This PR adds
wait_for_vmi_affinity()toBaseVirtualMachine, which polls the VMI until its affinity matches the VM template — ensuring the VM controller has reconciled before migration is triggered. All stuntime tests (L2 bridge + localnet) now call it afterset_template_affinity.Additionally, temporary post-migration affinity assertions are added (gated behind
is_jira_open("CNV-90576")) to detect the race if it still occurs despite the wait. These assertions will be removed once CNV-90576 is resolved.The first commit removes a redundant
set_template_affinitycall fromtest_server_migrates_between_non_client_nodes— the preceding test already sets the same affinity and@pytest.mark.incrementalpreserves it.Which issue(s) this PR fixes:
Special notes for reviewer:
jira-ticket:
NONE
Summary by CodeRabbit